Starting as you mean to go on
29-Sep-2025
Which of these files contains the most recent version of the data?
$ ls -l data/
-rw-r--r--. 1 hannah hannah 0 Sep 23 11:38 sample_metadata_clean.tsv
-rw-r--r--. 1 hannah hannah 0 Sep 23 11:37 sample_metadata.tsv
-rw-r--r--. 1 hannah hannah 0 Sep 23 11:38 sample_metadata_USE_THIS_ONE.tsv
-rw-r--r--. 1 hannah hannah 0 Sep 23 11:38 sample_metadataV2_final.tsv
-rw-r--r--. 1 hannah hannah 0 Sep 23 11:38 sample_metadataV2.tsv
What makes a file name useful? Metadata
We discussed file naming conventions. What were folks’ take-aways?
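One common convention is to start each file name with an ISO 8601 date (YYYY-MM-DD). The slides do not prescribe this; it is just one option. With that prefix, alphabetical order is also chronological order, so the newest version is the last one listed. A minimal sketch, using hypothetical file names:

```shell
#!/bin/sh
# Sketch: with an ISO 8601 date (YYYY-MM-DD) at the start of each file name,
# alphabetical order is also chronological order, so the newest version sorts last.
# The file names below are hypothetical examples.
set -eu

demo_dir=$(mktemp -d)
cd "$demo_dir"
touch 2025-09-02_sample_metadata.tsv \
      2025-09-23_sample_metadata.tsv \
      2025-09-15_sample_metadata.tsv

# The shell expands the glob in sorted order; the last name has the most recent date.
latest=$(printf '%s\n' *_sample_metadata.tsv | tail -n 1)
echo "Most recent version: $latest"
```

Unlike "_final" or "_USE_THIS_ONE", the date encodes the ordering in the name itself. It does not depend on modification times, which can all be identical, as in the listing above.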
Where should you look to find the latest version of the protocol you’re interested in testing?
Our lab’s SharePoint is a good example of what not to do…
Which enzyme assay is the one you want?
Choose an organizational style; stick with it
If sharing, document the organizational style
Divide work into project directories.
Take home: Each project directory should be self-contained and hold all files needed to go from raw data to final results
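A project folder is only self-contained if its scripts refer to files relative to the project, not by absolute paths on one person's machine. Below is a minimal sketch of a check for this. It assumes that absolute paths look like `/home/...`, `/Users/...` or a Windows drive letter. Those patterns and the function name `check_relative_paths` are illustrative and not exhaustive.

```shell
#!/bin/sh
# Sketch: flag hard-coded absolute paths in a project's files. A script that
# reads /home/<user>/... or C:\... breaks when the folder is moved or shared,
# so the project is not really self-contained. The patterns are illustrative
# assumptions, not an exhaustive list; check_relative_paths is a hypothetical name.
set -eu

check_relative_paths() {
    project_dir=$1
    # -r recurse, -n show line numbers, -I skip binary files, -E extended regex.
    # Matches Linux home dirs, macOS home dirs, and Windows drive letters.
    if grep -rnIE '(/home/|/Users/|[A-Za-z]:\\)' "$project_dir"; then
        echo "Absolute paths found; use paths relative to the project folder." >&2
        return 1
    fi
    echo "No absolute paths found in $project_dir"
}

# Demo on a throwaway project (hypothetical file names).
demo=$(mktemp -d)
mkdir -p "$demo/R"
printf 'meta <- read.delim("data/sample_metadata.tsv")\n' > "$demo/R/good.R"
if check_relative_paths "$demo"; then clean_status=0; else clean_status=1; fi

printf 'meta <- read.delim("/home/user/sample_metadata.tsv")\n' > "$demo/R/bad.R"
if check_relative_paths "$demo"; then bad_status=0; else bad_status=1; fi
```

The first check passes. The second fails and prints the offending line from `bad.R`.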
What subdirectories do folks use?
What questions should you ask when creating a new subdirectory?
Conferences/ Conference presentations, travel and administrative documents
Sean_qsip_tree/ Project file for creating a phylogenetic tree with Sean's qSIP project
Literature/ Relevant literature for ARCSS project (automatically integrated into Zotero/Mendeley libraries)
Senescence/ Project to identify likely senescence times for our sites
mimics_webapp/ Project for Stuart's harebrained (but genius) idea to turn MIMICS into a web app
Picarro Code/ Nascent code for processing Picarro outputs
useful_images/ Helpful images related to the project. Often useful in creating figures or presentations
Protocols/ Protocols related to lab work
Writing/ Writing folder; includes derived grants, manuscripts, etc.
qsip/ FICUS qsip project
Assembly-analysis/ Sub-analyses; files contain code, outputs, figures
cazyme_scraper/ Shortcut to a different project file, where I wrote a code pipeline
CN_versatility/ Sub-analyses; files contain code, outputs, figures
Core_microbiome/ Sub-analyses; files contain code, outputs, figures
data/ Raw data; files never edited; common across collaborators; contains both shortcuts to large data sets and actual files
general_climate_weather/ Sub-analyses; files contain code, outputs, figures
GraftM-analysis/ Collaborators' sub-analyses; I don't have to edit anything in here
identifying-outlier-years/ Sub-analyses; files contain code, outputs, figures
identify-temp-WTD-responders/ Sub-analyses; files contain code, outputs, figures
Metabolic-analysis/ Collaborators' sub-analyses; I don't have to edit anything in here
metadata_availability/ Sub-analyses; files contain code, outputs, figures
quantify_stability_with_time_figure/ Sub-analyses; files contain code, outputs, figures
SingleM-analysis/ Sub-analyses; files contain code, outputs, figures
setup.R Common analysis script that takes raw data and does initial cleaning
README.md Readme file; describes how to set up the code and data on your own computer
temporal_paper.yml Contains instructions for installing the software necessary for running all the code in the project
install_dependencies.sh Secondary installation script for software not covered by temporal_paper.yml
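The `data/` folder above holds raw data whose files are never edited. On Linux or macOS, one way to enforce that is to remove write permission from the raw files. This is an illustrative habit rather than something the slides prescribe, and the demo project is hypothetical:

```shell
#!/bin/sh
# Sketch: protect raw data from accidental edits by removing write permission
# from every file under data/. The directory itself stays writable, so new raw
# files can still be added. The demo project and file name are hypothetical.
set -eu

project=$(mktemp -d)
mkdir -p "$project/data"
printf 'sample\tsite\nS1\tA\n' > "$project/data/sample_metadata.tsv"

# a-w removes write permission for user, group and others.
# Files can still be read and copied, just not modified in place.
find "$project/data" -type f -exec chmod a-w {} +

ls -l "$project/data"
```

This guards against slips, not against a determined user. You can restore write access with `chmod u+w <file>` if a raw file genuinely has to be replaced.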
R/ R scripts live here; they include documentation in the form of R Markdown
slurm/ Slurm scripts for submitting jobs to the supercomputer live here
dada2_ernakovich.yml Installation and software information
README.md Tutorial information
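If you share a project, document its organizational style. A quick way to start is to list every top-level subdirectory in the README and fill in a one-line description for each, as in the listings above. A minimal sketch, where the function name `write_layout_section` and the demo folders are hypothetical:

```shell
#!/bin/sh
# Sketch: append a "Folder layout" section to a project's README.md with one
# entry per top-level subdirectory and a TODO placeholder for its description.
# The function name and demo folders are hypothetical.
set -eu

write_layout_section() {
    project_dir=$1
    {
        echo "## Folder layout"
        echo
        for d in "$project_dir"/*/; do
            [ -d "$d" ] || continue    # skip the literal pattern if nothing matched
            printf '%s\n' "- $(basename "$d")/: TODO describe what lives here"
        done
    } >> "$project_dir/README.md"
}

demo=$(mktemp -d)
mkdir -p "$demo/data" "$demo/R" "$demo/slurm"
write_layout_section "$demo"
cat "$demo/README.md"
```

The script only produces the skeleton. The descriptions, such as "Raw data; files never edited", still have to be written by hand.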
.
├── README.md
├── analysis            <- all things data analysis
│   └── src             <- functions and other source files
├── comm
│   ├── internal_comm   <- internal communication such as meeting notes
│   └── journal_comm    <- communication with the journal, e.g. peer review
├── data
│   ├── data_clean      <- clean version of the data
│   └── data_raw        <- raw data (don't touch)
├── dissemination
│   ├── manuscripts
│   ├── posters
│   └── presentations
├── documentation       <- documentation, e.g. data management plan
└── misc                <- miscellaneous files that don't fit elsewhere
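Creating the same skeleton by hand for every new project is tedious and error-prone, which makes it easy to drift from the chosen style. A minimal sketch that builds the tree above with `mkdir -p` follows. The function name `new_project` and the project name are hypothetical placeholders:

```shell
#!/bin/sh
# Sketch: create the folder skeleton from the tree above for a new project.
# new_project and my_new_project are hypothetical names; adjust the folders
# to whatever organizational style you have chosen.
set -eu

new_project() {
    root=$1
    mkdir -p "$root/analysis/src" \
             "$root/comm/internal_comm" "$root/comm/journal_comm" \
             "$root/data/data_clean" "$root/data/data_raw" \
             "$root/dissemination/manuscripts" "$root/dissemination/posters" \
             "$root/dissemination/presentations" \
             "$root/documentation" "$root/misc"
    # Create the README only if it doesn't exist yet, so reruns never overwrite notes.
    [ -e "$root/README.md" ] || printf '# %s\n' "$(basename "$root")" > "$root/README.md"
}

workdir=$(mktemp -d)
new_project "$workdir/my_new_project"
find "$workdir/my_new_project" | sort
```

Because `mkdir -p` does nothing for folders that already exist, running the script again on an existing project is harmless.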
Project folders allow you to take advantage of coding and project management tools
Most IDEs (Integrated Development Environments, e.g. RStudio) are set up to allow users to work in and switch easily between projects
Git version control: to track changes to your code and files, set up a Git repository in the project folder
Sharing a project is easy: share the project folder with your collaborator
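Turning a project folder into a Git repository takes only a few commands. The sketch below assumes `git` is installed. The folder and file names, and the placeholder identity passed with `-c`, are hypothetical. Use your own `user.name` and `user.email` in practice.

```shell
#!/bin/sh
# Sketch: put a project folder under Git version control.
# Assumes git is installed; folder, file and identity values are hypothetical.
set -eu

project=$(mktemp -d)/my_new_project
mkdir -p "$project/analysis"
printf 'x <- 1\n' > "$project/analysis/example.R"
cd "$project"

git init -q                      # creates the hidden .git/ folder that stores history
git add analysis/example.R       # stage the file so it will be tracked
# -c sets an identity for this one command, in case none is configured globally.
git -c user.name="Example User" -c user.email="user@example.com" \
    commit -q -m "Add first analysis script"

git log --oneline                # one line per commit: short hash and message
```

Because the repository lives inside the project folder, sharing or moving the folder also carries its full history.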
lab meeting on 10/27/25: . . .
We discussed directory structures
More information: MIMARKS guidelines; MIMAG guidelines
We discussed metadata and README files
Good data habits can be implemented regardless of your experience or computational skill level
Today we’ll go through some check-lists you can use to help cultivate good data and coding habits